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LDAP (Lightweight Directory Access Protocol) is 
a technology that can provide directory services 
to a range of applications. Directory service, a 
critical part of distributed computing, is the 
central point where network services, security 
services, and applications can form an integrated 
distributed computing environment. The 
simplicity of LDAP enables users to store and 
retrieve data easily from the directory. 
Nevertheless, as the use of directory services 
becomes more widespread, directories will need 
to scale to support millions of entries and 
millions of user requests with subsecond 
predictable performance. LDAP directories 
can be implemented using various storage 
mechanisms such as flat files, b-trees, or 
databases. This paper discusses an 
implementation of LDAP that uses the IBM 
DATABASE 2™ relational database as the data 
store and query engine to meet the directory 
service requirements. Performance analysis is 
provided to show that a relational database can 
be used to successfully meet the performance 
and scale needs of an LDAP directory while 
remaining secure and competitive with other 
vendor implementations. 



Directories are special-purpose databases, usu- 
ally containing categorized information to sup- 
port frequent data retrieval and data update. 1 
Directory service, a critical part of distributed 
computing, is the central point where network ser- 
vices, security services, and applications can form an 
integrated distributed computing environment. The 
current usage of a directory service can be classified 
into the following categories: 



• Name service — Use directory as a source to locate 
Internet host address or the location of the server. 
Examples are Domain Name System (dns) and 
DCE (Distributed Computing Environment) Cell 
Directory Service (CDS). 

• User registry — Store information of all users in a 
system. A central repository of user information 
will enable the system administrator to adminis- 
ter the distributed system as a single system im- 
age, especially if the system is composed of a num- 
ber of interconnected machines. Novell Directory 
Services** (nds**) is an example. 

• White pages lookup — Some modern e-mail clients 
provide users with the capability of looking up peo- 
ple's names and e-mail addresses. The users type 
in the name, or part of the name, and the direc- 
tory service will extract the e-mail information for 
the user. Netscape Communicator**, Lotus 
Notes**, Eudora**, and other e-mail clients pro- 
vide the address book lookup capability. 

With more and more applications and system ser- 
vices demanding a central information repository, 
the next-generation directory service will be provid- 
ing system administrators with a data repository that 
could significantly ease the administrative burden. 
In addition, the future directory service will also pro- 
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vide end users with a rich information data ware- 
house that allows them to access department or com- 
pany employee data, or resource information such 
as the name and location of printers, copy machines, 
etc. In the Internet or intranet environment, users 
will be able to use the public key certificates stored 
in the directory to handle encrypted or digitally 
signed documents. 

Lightweight Directory Access Protocol (ldap) is an 
emerging technology that can provide directory ser- 
vices to applications ranging from e-mail systems to 
distributed system management tools. LDAP is a sim- 
ple directory data access protocol that supports a rich 
set of operations for a wide range of applications. 
LDAP is an open Internet standard, defined by the 
Internet Engineering Task Force (ietf). A number 
of implementations of LDAP are available, ranging 
from commercial to publicly available open-source 
products. 

One key feature of LDAP is it is simple but function- 
ally rich. The simplicity of LDAP enables users to store 
and retrieve data easily from the directory. Never- 
theless, as the use of directory services becomes more 
widespread, directories will need to scale to support 
millions of entries and millions of user requests with 
subsecond predictable performance. The protocol 
(LDAP) and the application programming interfaces 
(APIs) to access the directory are simple, but the 
amount of data stored in the directory can be tre- 
mendous. There must be a natural growth path for 
scale and performance that does not require an all- 
out replacement of the currently installed directory. 

LDAP directories can be implemented using various 
storage mechanisms such as flat files, b-trees, or da- 
tabases. The data store is as important as (if not more 
than) the protocol implementation of the directory 
server. It must scale, provide transactional integrity, 
be robust, and handle a variety of queries securely 
from simple to negation, existence, and wild cards — 
all without paying a noticeable performance penalty. 

This paper discusses an implementation of LDAP that 
uses the IBM database 2* (DB2*) relational data- 
base as the data store and query engine to meet the 
directory service requirements and demonstrates 
performance via a set of benchmarks that catego- 
rize the types of queries. In detail, this paper exam- 
ines the salient points of the University of Michigan 
reference implementation, the trade-offs of using a 
relational database, the LDAP hierarchical informa- 
tion model through a set of relations (tables), the 



mapping of the LDAP query language to SQL (Struc- 
tured Query Language) given the defined relations, 
the access control model to provide authorization 
in the LDAP directory service, an identified set of 
items implemented to further enhance performance, 
and finally the performance comparison of a rela- 
tional implementation to a file-system-based imple- 
mentation given a set of well-defined benchmarks. 
This paper is an enhanced version of an earlier con- 
ference paper. 2 

Summary of related work 

LDAP was originally implemented by the University 
of Michigan (U of M). The U of M reference im- 
plementation 3 is freely available through their FTP 
site. The implementation of the U of M LDAP is based 
on several freely available b-tree packages, such as 
gnu dbm and Berkeley db packages. This reference 
implementation supports LDAP version 2 protocol 
and is used as a basis for the LDAP/DB2 directory. The 
ldap protocol and APIs are defined in ietf RFCs (Re- 
quests for Comments) 1777-1779 and 1823. 4 7 

LDAP was coauthored by Timothy A. Howes and 
Mark C. Smith, 8 and an implementation was done 
at the University of Michigan. The U of M LDAP is 
a sound reference implementation that helps peo- 
ple to understand the internals of LDAP. However, 
it needs a lot of enhancements to be a reliable and 
scalable enterprise directory service. First, the num- 
ber of entries is limited. It does not scale to more 
than a few hundred thousand to possibly a million 
entries. By using simple file-system-based hash and 
b-tree packages, it is not able to handle large amounts 
of data. In contrast, relational database technolo- 
gies such as DB2 9 are designed to handle terabytes 
of data. Second, populating directories with large 
numbers of entries is time-consuming work. A great 
amount of time is required to populate the direc- 
tories with millions of entries. Third, because of lim- 
ited search and indexing facilities provided by the 
file -system-based back end, only candidate entries 
can be retrieved. Then each entry is filtered through 
the filter program before it is returned to the client. 
However, in some cases, when the set of candidates 
is large, the search degenerates into a sequential 
search. For example, we discovered that negation 
queries and existence queries are fairly expensive 
with both the reference implementation and LDAP 
server from Netscape Communications C orporation. 
By using the powerful search engines provided by 
DB2, we were able to address some of the weak 
areas. 
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Mapping the LDAP model 110 ' 11 to relational tables, 
however, is not a trivial task. First, LDAP allows both 
single-valued and multivalued attributes. But a re- 
lational database does not deal with multivalued at- 
tributes well. Second, the size of each DB2 table 9 is 
limited to 4K. Third, the LDAP model is hierarchi- 
cal. It is known that hierarchies are very easy to rep- 
resent in hierarchical databases (such as the Infor- 
mation Management System, or IMS*), where the 
structure of the data and the structure of the data- 
base are the same. Unfortunately, it is generally 
thought that relational databases do not provide ad- 
equate support for such data. It is not possible to 
directly map hierarchical data into tables because 
tables are based on sets rather than on graphs. Dif- 
ferent vendors provide different mechanisms for the 
tree structure. For example, DB2 9 provides the with 
clauses in the SELECT statement to provide sub- 
tree traversal with arbitrary depth. Oracle 12 has 
CONNECT BY PRIOR and START WITH clauses in the 
SELECT statement to provide partial support for 
reachability and path enumeration. But all mecha- 
nisms will end up with recursive queries to handle 
hierarchical structures. Through experimentation, we 
discovered that recursive queries do not scale up well 
for large numbers of records in the table. We did a 
small experiment with 1000 LDAP entries using DB2 
recursive queries; a simple select takes more than 
five minutes to complete. 

U.S. Patent 5467471 13 presents a solution that does 
not require recursive query. The invention provides 
a genealogy table with which the directory hierar- 
chy is represented in a table form. Each column of 
the genealogy table represents a level of the direc- 
tory tree. This solution might be fine for directories 
with limited hierarchy depth. However, it is very dif- 
ficult to realize the idea in practice when the direc- 
tory is infinitely deep and the number of columns of 
a table is limited. We have attempted a similar im- 
plementation but learned that the complexity is very 
great and the performance implication is unclear. 

To address this problem, we invented an efficient 
method 1415 to represent LDAP hierarchies with re- 
lational tables without the overhead of recursive que- 
ries. 1620 Our performance results showed that our 
implementation is competitive with other directory 
products in the industry. 

Trade-offs of using DB2 

This section is a summary of the advantages of using 
DB2 as the LDAP data store. 



One advantage is being a highly scalable data store. 
Unlike b-tree libraries in which the application pro- 
cess is tightly coupled with the physical storage, DB2 
offers a number of solutions to store and retrieve 
large amounts of data. First of all, DB2 provides very 
flexible data placement options based on table 
spaces. A DB2 database can have multiple table 
spaces, and each table space can be created, mod- 
ified, and recovered independently of another. Ta- 
bles can be created in a preallocated storage (within 
a table space), or the storage can be allocated only 
when it is required or when the data are populated. 
The table data can be entirely contained in a single 
table space or multiple table spaces. DB2 also incor- 
porates the concept of containers. A container is the 
allocation of physical spaces. In ATX* (Advanced In- 
teractive Executive), a container is a logical volume. 
With the layer of table space, DB2 is able to handle 
large amounts of data independent of the limit of 
physical data storage (hard drive) and logical data 
storage (file system). For example, the maximum ca- 
pacity of an AIX 4.1.x file system is 2 gigabytes. With 
table space, DB2 allows a database to grow up to 60 
terabytes of data on a uniprocessor machine. Sec- 
ond, DB2 provides a truly distributed data store so- 
lution. Various configurations can be arranged with 
DB2. The configurations can be classified into the fol- 
lowing categories: 

• Single client/multiple server configuration. In a 
client/server configuration, every database resides 
on one network node, a database server, and is 
managed by a database management system 
(DBMS) running on that node. The applications can 
run on network nodes, such as the client nodes, 
separated from servers. The DB2 client provides 
the communication mechanism to retrieve data 
from the DB2 server. In the LDAP environment, the 
LDAP server (a database client) can connect to a 
number of networked databases that contain the 
directory information. However, from the user's 
perspective, the ldap server actually stores all the 
information without knowing in which database the 
data are actually located. With this configuration, 
the LDAP server is freed from managing the phys- 
ical data storage and is able to retrieve informa- 
tion from multiple database servers that work to- 
gether to form a huge data store. 

• Multiple clients/multiple servers configuration. 
The database clients can connect to any database 
server that contains the directory information. The 
collection of database servers actually form a sin- 
gle directory system image, and more than one 
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Figure 1 Multiple clients/multiple servers configuration 




LDAP server can access the directory information. 
Since all the LDAP servers see the same directory 
image, a network dispatcher can be deployed to 
route requests among the LDAP servers. Figure 1 
illustrates the configuration. 

Multiple clients/parallel super server configuration. 
In certain environments where users need to store 
large amounts of information in the directory, 
DB2 PE (Parallel Edition) provides a solution to en- 
able users to store huge amounts of information 
in a single database. Instead of partitioning the da- 
tabase from the application level, DB2 PE automat- 
ically partitions the database into different ma- 
chines. In addition, DB2 PE divides database queries 
into smaller, independent tasks that can execute 
concurrently; it is possible to complete the query 
fast enough to meet an end user's response time 
needs. LDAP can benefit from the power of par- 
allel processing of DB2 PE with no changes to the 
server implementation. Another advantage of the 
parallel database solution is the ability to expand 



the system incrementally. This means users can 
have a startup system with smaller, lower-cost con- 
figurations to match initial size requirements. 
Then, as the database grows, users can easily add 
appropriately sized resources to accommodate 
growth. DB2 pe enables users to store terabytes of 
data into a database. Figure 1 illustrates the con- 
figuration. 

A second advantage is being an atomic transaction. 
With the atomic transaction supported by DB2, LDAP 
servers can survive hardware or software failures and 
still maintain the integrity of the directory informa- 
tion. One of the problems with b-tree packages is 
that there are occasions where the database will be- 
come corrupted because hardware or software fail- 
ures occur while updates are in progress. DB2 pro- 
vides atomic transaction to ensure that updates of 
committed transactions will be recorded in the da- 
tabase. Uncommitted transactions, however, will be 
rolled back when system failure occurs. Database in- 
tegrity will be maintained under all circumstances. 
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Figure 2 Possible syntaxes and meanings 



bin 


binary 


dn: 


distinguished name 


ces 


case exact string 


ci s 


case ignore string 


MONO tel 


telephone number string (like cis but blanks and dashes are 




ignored during comparisons) 


dn: 


distinguished name 



As a third advantage, it is an on-line backup and re- 
store facility. DB2 allows users to back up databases 
while it is either on line or off line. If the backup is 
performed off line, only the backup task can be con- 
nected to the database. If the backup is performed 
on line, other applications will be able to connect to 
the database while the backup task is running. The 
database can be local or remote. Users can back up 
a database or table space to disk, table, or a location 
managed by a utility such as the IBM adsm (adstar 
Distributed Storage Manager). A graphical tool is 
available (db2jobs) for users to monitor backup and 
recovery. DB2 provides a very fast and efficient re- 
store facility. A 100K entry LDAP directory can be 
restored in less than 10 minutes. 

As a fourth advantage, it provides alternative rep- 
lication support. In addition to the replication sup- 
ported by ldap, the user can use the DB2 data rep- 
licator to provide faster and more efficient replication 
for frequent updates. 

As a fifth advantage, it is a fast database loading fa- 
cility. LDAP/DB2 provides a fast facility (bulkload) for 
initial load or for updating tables with large amounts 
of data. This facility is based on the DB2 LOAD util- 
ity. DB2 can move data into tables, create an index, 
and generate database statistics. The bulkload tool 
can populate the directory with 100K entries twice 
as fast as our nearest competitor. 

The sixth advantage is having a powerful query pro- 
cessing engine that can deal with complicated que- 
ries. As is well-known, SQL is a very powerful query 
language. One of the problems that we observed in 
the U of M reference implementation is that it is not 
able to handle queries with the negation operator 



(!). The query usually degenerates into a linear 
search. LDAP/DB2, in contrast, is able to translate all 
LDAP queries into SQL and obtains timely and ac- 
curate results. 

To summarize, DB2 provides hard-won advantages 
such as scalability, transaction integrity, backup and 
recovery, stability, and a powerful query processing 
engine. But all of the features mentioned above come 
with costs. Before our implementation, we had con- 
cerns about how the extra path length added by DB2 
would affect the response time for LDAP queries. Af- 
ter extensive performance measurements, we discov- 
ered that LDAP/DB2 performs well compared to other 
LDAP products in the industry. 

LDAP and relational model 

In this section, we describe the information model 
and the relational aspects of the model for LDAP. 

LDAP information model. The LDAP directory da- 
tabase consists of entries. Each entry is composed 
of one or more attributes. A type is associated with 
each attribute, and an attribute can have more than 
one value in an entry. The attribute type determines 
the syntax of the attribute. The syntax of an attribute 
determines how the data will be compared against 
the values in the query. In LDAP v2, 4 721 the possible 
syntaxes and their meanings are listed in Figure 2. 

The LDAP schema is a collection of attribute defi- 
nitions, object class definitions, and other informa- 
tion that a server uses to determine how and what 
to return for a given request. An attribute is infor- 
mation of a particular type and may have one or more 
associated values. A set of attributes is called an ob- 
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ject class. A directory entry is an instantiation of one 
or more object classes, that is, a set of attributes. At- 
tributes in a given object class are specified as man- 
datory or optional. Mandatory attributes are re- 
quired to have values in the directory entry. Optional 
attributes, in contrast, do not have to exist for the 
directory entry of an object class. Figure 3 illustrates 
a possible definition of an object class called Per- 
son. 

In object class Person, cn (common name), sn (sur- 
name), and objectclass are mandatory attributes; 
mail, phone, address, and fax are optional attributes. 

LDAP entries are arranged in a tree structure that 
typically follows a geographical and organizational 
structure but is not limited to such structures. Each 
entry is uniquely identilied by a distinguished name 
(dn). The formal definition of a distinguished name 
is given in RFC 1779." 

The functions provided by LDAP can be categorized 



» Query: search and compare. These operations are 
used to retrieve information from the database. 
For the search function, the criteria of the search 
is specified in the search filter. The search filter is 
a Boolean expression that consists of attribute 
names, attribute values, and the Boolean opera- 
tors AND represented by &, or represented by |, 
and not represented by !. Users can use the filter 
to perform fairly complicated search operations. 
The filter syntax is defined in RFC I960. 22 

In addition to the search filter, users can also spec- 
ify where the search starts in the directory tree 
structure. The starting point is called the base DN. 
The scope of the search can be a single entry (base- 
level search), the children of an entry (one-level 
search), or an entire subtree (subtree search). Al- 
though LDAP does not provide separate read and 
list operations, the search operation is used to pro- 
vide these operations by setting the DN, scope, and 
filter appropriately. 

» Update: add, delete, and modify. Users can use 
these functions to update the contents of the di- 
rectory. 

» Authentication: bind and unbind. LDAP supports 
a simple ID (identifier) and password authentica- 
tion scheme. In the bind operation, the user can 
specify the ID and password, and the server will 



Figure 3 Possible definition of object class Person 



objectClass Person 
required cn.sn.objectCl ass 
allows mall, phone, address, fax 



use this information to authenticate the client. If 
successful, the authentication is in effect for the 
life of that LDAP session, that is, until the corre- 
sponding unbind operation is issued. 

• Rename: ModRDN. MoDRDN renames an existing 
directory entry. 

Model LDAP through relations. At first glance, it 
seems very obvious that we should be mapping an 
LDAP object class into a DB2 relation. However, this 
mapping posed a serious problem since the LDAP 
model allows both single-valued and multivalued at- 
tributes. In the database design guideline, the First 
Normal Form requires that attributes within each 
tuple are ordered and complete and that the domains 
permit only simple values. Simple values cannot be 
decomposed into multiple values and cannot them- 
selves be sets or relations. Some database systems 
(such as DB2) are attempting to support multivalued 
attributes. However, the implementation is not avail- 
able yet. Unnormalized relations will make update 
operations (e.g., add, modify, and delete) fairly dif- 
ficult to manage. We also discovered that we might 
lose some data semantics during the update process 
when multivalued attributes exist. 

Instead, we mapped each LDAP attribute that can be 
searched by the user to an attribute relation. The 
attribute definitions were obtained from the attribute 
configuration files. This relation consists of two col- 
umns: unique entry identifier (eid) and normalized 
attribute value. In our system, each LDAP entry is as- 
signed an EID. On the basis of the attribute syntax, 
the attributes are converted (or normalized) so that 
our system can apply SQL queries to the attribute val- 
ues. For example, if the attribute syntax is case in- 
sensitive (CIS), the attribute value will be converted 
to all uppercase and stored in the attribute table. The 
attribute table is used mainly for the search oper- 
ation to find the entries that match the filter crite- 
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Figure 4 LDAP naming hierarchy 
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ria. The actual entry data are stored in the ldap_en- 
try table. In other words, the SQL queries generated 
by our system use the attribute table to locate the 
entry ElDs that match the filter expression and use 
the ElDs to retrieve the entry data from the ldap_en- 
try table. Another advantage of this per- attribute ta- 
ble is that the size of the entry is no longer bounded 
by the DB2 4K limit. The attribute table and 
ldap_entry table are similar to the id2entry and at- 
tribute indices in the U of M reference implemen- 
tation. The main difference is that our implemen- 
tation is able to retrieve the exact target entries 
instead of just "'candidates." As a result, no postpro- 
cessing of filtering entries is needed in LDAP/DB2. 

A second challenge is to map LDAP to relational ta- 
bles because the LDAP model is hierarchical. We dis- 
covered that with simple relations we were able to 
support ldap search (base, one level, and subtree) 
with decent performance. The next subsection dis- 
cusses the details. 

In addition, we also have an LDAP entry table that 
holds the information about an LDAP entry. This table 
is used for obtaining the EID of the entry and support- 
ing LDAPSCOPEONELEVEL and LDAP SCOPE BASE 

search scope. Entries are stored using a simple text 
format of the form "attribute: value" as in the U of 
M reference implementation. Non-ASCII values or 
values that are too long to fit on a reasonably sized 
line are represented using a base 64 encoding. Given 
an ID, the corresponding entry can be returned with 
a single SELECT statement. 



Mapping the LDAP hierarchy through relations. As 

illustrated in Figure 4, the LDAP naming hierarchy 
includes a number of entries where each entry is rep- 
resented by a unique entry identifier (eid). Thus, for 
example, the root node has an EID = 1. Root has 
two children, entry GB ("Great Britain") with EID = 

2, and entry us ("United States") with eid = 3. Child 
node us itself has two children, o = IBM (with EID 
= 4) and o = Netscape (with eid = 5). The remain- 
der of the naming directory includes several addi- 
tional entries at further sublevels. 

A particular entry thus may be a "parent" of one or 
more child entries. An entry is considered a "par- 
ent" if it is located in the next higher level in the hi- 
erarchy. Likewise, a particular entry may be an an- 
cestor of one or more descendant entries across many 
different levels of the hierarchy. A parent-child pair 
will also present an ancestor-descendant pair. 

We created two relations to model the hierarchy in 
ldap: parent-child (parent table) and ancestor-de- 
scendant (ancestor table). The parent table is cre- 
ated as follows. For each entry that is a parent of a 
child entry in the naming hierarchy, the unique iden- 
tifier of the parent entry (peid) is associated with 
the unique identifier of each entry that is a child of 
that parent entry. In the LDAP hierarchy illustrated 
in Figure 4, PEID 1 is associated with EID 2 and EID 

3, PEID 3 is associated with EID 4 and EID 5, and so 
on. Each row of the parent table includes a PEID-EID 
pair. 
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Figure 5 One-level SQL query skeleton 



One-Level Search: 

SELECT <data fields> 

from ldap_entry as entry 

where entry. EID in ( 

select distinct ldap_entry.EID 

from ldap_entry as pchild, <tab1e list> 

where ldap_entry. EID=pchi Id . EID AND pchild. PIED=<root dn id> 
AND ldap_entry.EID IN <sql select statements >) 



The descendant table is created as follows. For each 
entry that is an ancestor of one or more descendant 
entries in the hierarchy, the unique identifier of the 
ancestor entry (aeid) is associated with the unique 
identifier of each entry that is the descendant (deid) 
of that ancestor entry. The AEID field is the unique 
identifier of an ancestor LDAP entry in the LDAP nam- 
ing hierarchy. The DEID field is the unique identi- 
fier of the descendant ldap entry. Thus, in the nam- 
ing hierarchy illustrated in Figure 4, AEID 1 has DElDs 
2-1 3 because each of the entries 2-13 is also a de- 
scendant of the root node, aeid 3 has DElDs 4-13, 
AEID 4 has DElDs 6-13, and so on. Each row in the 
descendant table thus includes an AEID-DEID pair. 

For the LDAP search operation, the criteria of the 
search are specified in a search filter. The search fil- 
ter is typically a Boolean expression that consists of 
attribute name, attribute value, and the Boolean op- 
erators and, or, and not. Users can use the filter 
to perform complex search operations. The filter syn- 
tax is defined in RFC 1960. 22 In addition to the search 
filter, users can also specify where in the directory 
tree structure the search is to start. The starting point 
is called the base DN. The search can be applied to 
a single entry (a base-level search), the children of 
an entry (a one-level search), or an entire subtree 
(a subtree search). Thus, the "scope" supported by 
an LDAP search consists of base, one level, and sub- 
tree. The parent and ancestor tables are used to fa- 
cilitate one-level and subtree searches without re- 
cursive queries. In both cases, the search begins by 
going into the database and using the LDAP filter cri- 
teria to retrieve a list of entries matching the filter 
criteria. If the search is a one-level search, the par- 



ent table is then used to filter out eids that are out- 
side the search scope (based on the starting point 
or base dn). Tikewise, if the search is a subtree 
search, the descendant table is then used to filter out 
ElDs that are outside the search scope (again, based 
on the base dn). However, all the steps mentioned 
above are performed in a single SQL query. Figures 
5 and 6 are some examples of the SQL query skel- 
eton that we used during the one-level and subtree 
search. In these examples, <data fields> represents 
the SQL column name of the relations defined in the 
LDAP/DB2 schema that is described in more detail in 
the next subsection. <table list> and <sql select 
statements> are the two null terminated strings re- 
turned by the filter translator. The details of how the 
inner <sql select statements> were generated are 
described in the next section. <root dn id> is the 
unique identifier of the root DN. 

In the one-level search query, the parent table in- 
formation is contained in the ldap_entry table. Since 
the ldap_entry table also contains the entry informa- 
tion, we use SQL as operator to provide an alias pchild 
to represent the parent and child relation. Then, in the 
where clause, "ldap_entry.ElD=pchild.ElD" is used 
to filter out entries that are not in the one-level search 
scope. 

In the subtree search query, the ancestor informa- 
tion is stored in the ldap_desc table. The inner where 
statement "ldap_entry.ElD=ldap_desc.DElD" is 
used to filter out entries that are not in the subtree 
search scope. 
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Figure 6 Subtree SQL query skeleton 



Subtree Search: 

SELECT <data fields> 

from 1dap_entry as entry 

where entry. EID in ( 

select distinct 1 dap_entry .EID 

from ldap_entry, 1dap_desc, <table list> 

where (ldap_entry.EID=1dap_desc.DEID AND 1dap_desc.AEID=<root dn id>) 
AND ldap_entry.EID IN <sq1 select statements >) 



Database schema. We now give a detailed explana- 
tion of the tables that we defined in LDAP/DB2. 

Entry table. The entry table holds the information 
about an LDAP entry. This table is used to obtain the EID 
of the entry and to support ldapscopeonelevel 
and LDAP SCOPE BASE search scope. The parent and 
child table is included in the entry table since all the 
other attributes are dependent on EID. The columns 
in this table are: 

• EID — The unique identifier of the LDAP entry. This 
field is indexed. 

• PEID — The unique identifier of the parent LDAP 
entry in the naming hierarchy. For example, the 
LDAP entry with the name "ou= Information Di- 
vision, ou= People, o= University of Michigan, 
c=us" is the parent of "cn=Barbara Jensen, 
ou= Information Division, ou= People, o= Uni- 
versity of Michigan, c=us." 

• DN — The distinguished name of the entry. 

• DN_TRUNC — Truncate dn to 250 characters so that 
we can build indices on this field. 

• EntryData — Entries are stored using a simple text 
format of the form "attribute: value" as in the U 
of M reference implementation. Non-ASCII values 
or values that are too long to fit on a reasonably 
sized line are represented using a base 64 encod- 
ing. Given an ID, the corresponding entry can be 
returned with a single SELECT statement. 

• Creator — The DN of the entry creator. 

• Modifier — The DN of the entry modifier. 

• modify_timestamp — Records the time when the 
entry was last modified. 



• createjimestamp — Records the time when the en- 
try was created. 

Attribute table. There is one attribute table per search- 
able attribute. Each LDAP entry is assigned a unique 
identifier (eid) by the backing store. The columns 
for this table are: 

• EID — The unique identifier of the LDAP entry. 

• Attribute value — Normalized attribute values. 

• Truncated attribute value — If the length of the col- 
umn is longer than 250 bytes, a truncated column 
is created for indexing. In DB2, the maximum length 
for an indexed column is 255 bytes. The SQL type 
of the attribute depends on the ldap data type. 
Indices can be created for attributes whose size is 
less than 255 bytes. 

Descendant table. The purpose of the descendant ta- 
ble is to support the subtree search feature of ldap. 
For each LDAP entry with a unique ID (aeid), this 
table contains the unique identifiers (deid) of the 
descendant entries. The columns in this table are: 

• AEID — The unique identifier of the ancestor LDAP 
entry. This entry is indexed. 

• DEID — The unique identifier of the descendant 
LDAP entry. This entry is indexed. 

For every entry in the directory, a row exists in this 
table for each of its ancestors including itself. The 
size of the table depends on the depth of each entry. 
In the worst case, if all the entries were at the same 
depth, the number of rows in the table is in O(nm), 
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where n is the number of nodes in the directory and 
m is the depth of the tree. 

LDAP filter to SQL translation 

This section discusses how LDAP/DB2 translates LDAP 
filters 22 into various types of SQL queries. We im- 
plemented a filter translator to generate the equiv- 
alent SQL expression corresponding to an LDAP fil- 
ter that can be used in the WHERE clause of an 
SQL SELECT statement. For all queries, the general 
approach is to obtain the entry ElDs that match the 
search criteria based on the filter from the attribute 
table. Then the parent or ancestor tables are used 
to check whether the ElDs are located in the subtree 
under the base DN. After obtaining the entry ElDs 
that satisfy the filter and search scope criteria, the 
entry data are retrieved from the LDAP entry table. 
However, all the operations mentioned above are 
performed in a single SQL query. We discovered that 
combining subqueries into a single query is much 
more efficient than performing subqueries indepen- 
dently. The combined SQL query not only saves con- 
text switching cost, but also provides the DB2 query 
optimizer with more information to come up with 
an optimum access plan. 

LDAP filters consist of six basic search filters with the 
format <attribute> <operator> <value>. Com- 
plex search filters can be generated by combining ba- 
sic filters with the Boolean operators AND, OR, and 

NOP. 

The skeleton of SQL select statements used by 
LDAP/DB2 search routines are illustrated in Figure 
7. 

On the basis of the filter received, our SQL transla- 
tor will generate <table list> (a list of attribute ta- 
bles) and <sql select statements>. <root dn id> is 
the unique identifier of the root DN. The translation 
rules for basic filters and Boolean filters are pre- 
sented in the following subsections. In the transla- 
tion rules, tablename is the SQL table for the spec- 
ified attribute, and columnname is the column name 
containing the attribute values. 

Equality. The equality search operator locates en- 
tries with attributes exactly equal to the given value. 
The translation rule is shown in Figure 8. 

For example, the purpose of the filter (sn = Jensen) 
is to find surnames exactly equal to Jensen. The cor- 



responding SQL subquery generated for this filter is 

(select eid from sn where sn = 'jensen '). 

Ranges. For attributes supporting ordering, the LDAP 
filter provides inequality operators such as "greater 
than or equal" and "less than or equal." The trans- 
lation rules are given in Figure 9. 

For example, the LDAP filter (sn > = Jensen) locates 
entries with surnames lexicographically greater or 
equal to Jensen. The corresponding SQL subquery 
generated for this filter is (SELECT EID FROM sn 
where sn = 'jensen'). 

Substring. LDAP supports arbitrary substring match- 
ing for text attributes. The user can put the wild-card 
character (*) at the beginning of a string, the middle 
of the string, the end of the string, or any combina- 
tion of these in the LDAP filter. The format of the 
substring filter is: 

(<attr> = [<leading>]* [any]*[<trailing>]) 

The SQL operator LIKE is used for substring match- 
ing. The SQL LIKE operator has the following syntax: 

column like pattern 

PATTERN combines string constants with wild-card 
characters. SQL recognizes two wild-card characters: 
(f ) the percent symbol (%), which means match zero 
or more characters, and (2) the underscore symbol 
(_), which means match any one character. 

We use the SQL wild-card character "%" for the LDAP 
wild-card character. The LDAP substring filter 
"(attribute=value-with-stars)" is translated into 
"(SELECT EID FROM tablename WHERE columnname 
LIKE 'value with percents ')." 

For example, the LDAP filter (sn = *jensen*) locates 
surnames containing the string "jensen." The follow- 
ing SQL query is generated: 

(SELECT EID FROM sn 

where sn like ' %jensen% ' ) 

Approximate. The approximate search operator lo- 
cates entries with attributes that sound like the given 
attribute value. The format of the approximate 
search filter is (<attr>~ = <value>). 
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Figure 7 Skeleton of SQL SELECT statements 



Base-Level Search: 

SELECT entry. EntryData, operational attr1butes> 
from 1dap_entry as entry 
where entry. EID in ( 

select distinct ldap_entry.EID 
from <table list> 

where (ldap_entry.EID=<root dn id> ) 

AND 1dap_entry . EID IN <sql select statements >) 

One-Level Search: 

SELECT entry. EntryData, operational attributes> 
from ldap_entry as entry 
where entry. EID in ( 

select distinct 1 dap_entry . EID 

from ldap_entry as pchild, <table list> 

where 1 dap_entry . EID=pchi 1 d . EID AND pchild. PIED=<root dn 1d> 
AND 1dap_entry . EID IN <sql select statements > ) 

Subtree Search: 

SELECT entry . EntryData , operational attributes> 
from ldap_entry as entry 
where entry. EID in ( 

select distinct ldap_entry.EID 

from ldap_entry, 1dap_desc, <table list> 

where (ldap_entry.EID=1dap_desc.DEID AND ldap_desc.AEID=<root dn 
AND ldap_entry.EID IN <sq1 select statements > ) 



Figure 8 Translation rule for equality operator 



LDAP filter: 

(<attr> = <value>) 
SQL expression: 

(SELECT EID FROM tablename WHERE columnname = 'value') 



The DB2 SOUNDEX library function is used for an ap- 
proximate search. The soundex function returns 
a four-character string that is either a CHAR or 
varchar. The soundex function is useful for find- 



ing strings for which the sound is known but the pre- 
cise spelling is not. It makes assumptions about the 
way that letters and combinations of letters sound 
that can help to search for words with similar sounds. 
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Figure 9 Translation rules for inequality operators 



LDAP filter: 






(<attr> >= <value>) 
(<attr> <= <value>) 






SQL expression: 






(SELECT EI D FROM tablename WHERE 
(SELECT EID FROM tablename WHERE 


col umnname >= ' val u 
col umnname <= ' val u 


e') 
e'} 



Figure 1 0 Translation of LDAP search filter 




SELECT EID FROM tablename WHERE SOUNDEX(col umnname) 
=S0UNDEX( 'value'))) 



Figure 1 1 Example of SQL query 



SELECT EID FROM sn WHERE SOUNDEX(sn) 
=S0UNDEX( 'Jensen' )))) 



The LDAP search filter "(attribute— =value)" is trans- 
lated to the SQL query in Figure 10. For example, 
the LDAP filter (sn~ =jensen) locates entries with sur- 
names that sound like "jensen." The SQL query gen- 
erated is illustrated in Figure 11. 

The basic LDAP hlter can be combined to form more 
complicated filters using the Boolean operators — AND 
(&), OR (|), and NOT (!) — and a prefix notation. 

Others. The attribute values specified in the ldap 
filters cannot contain unescaped left or right paren- 



thesis characters. Escape combinations (backslash 
followed by any character) are translated as indicated 
in Figure 12. 

Any single-quote characters found in the attribute 
value will be translated to two single-quote charac- 
ters since the SQL value is enclosed in single-quote 
characters. 

Complex queries. On the basis of the basic translation 
rules mentioned above, our biggest challenges are to 
provide an algorithm that can do the following: 



IBM SYSTEMS JOURNAL, VOL 39, NO 2, 2000 



SHI ET AL. 371 



Figure 1 2 Translation of escape combinations 



\) will translate to ) in the SQL value 
\( will translate to ( in the SQL value 
\* will translate to * in the SQL value 
\\ will translate to \ in the SQL value 

\c will translate to \c in the SQL value where c is any other 
characters other than ) or ( or * or \ 



Figure 1 3 Example of intuitive solution 



LDAP filter: 

Idap filter ( I (f 1=' vl ' ) ( f2=' v2 ' ) ) 



SELECT EntryData 

FROM ldap_entry, fl,f2 

WHERE (fl.fl^flvalue 1 ) OR Cf2.f2='f£ 

AND (ldapentry.EID=fl.UID) 

AND (ldapentry.EID=f2.UID) 

AND (Idapentry.EID IN 

SELECT DEID from ldapdesc 

WHERE PEID=<UID>) 



• Combine the basic expressions to form a single SQL 
query that will retrieve the target entries that ex- 
actly match the search criteria 

• Deal with complicated LDAP queries with infinite 
logical depth 

• Deal with all logical operators efficiently 

An intuitive solution is based on joining the attribute 
tables and applying the basic expressions to the at- 
tributes in the joined table. The LDAP AND and OR 
operators, in this case, can be translated into SQL AND 
and OR directly. In addition to the combined SQL 
expression, we need to include the JOIN condition 
based on EID. Figure 13 contains an example. 

However, it is difficult to generalize this solution to 
handle the not operator. The LDAP NOT operator 



is basically used to locate entries that do not match 
the search criteria. A naive solution is to directly 
translate the LDAP NOT operator into an SQL NOT 
operator. A sample SQL query is illustrated in Fig- 
ure 14. 

The SQL query in Figure 14 does not yield the cor- 
rect answer. Figure 15 illustrates the problem. There 
are five entries in this sample database (EID from 1 
through 5) where the values of attribute fl for entry 
1, 2, 3, and 4 are vl. However, fl of entry 4 is a mul- 
tivalued attribute which has value vl and foo. With 
the SQL statement in Figure 14, entry 4 is the an- 
swer. However, the correct answer should be entry 
5. We discovered the following problems with the 
table join approach: 
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Figure 1 4 Sample SQL query for NOT operator 



LDAP filter: 

Idap filter (!(fl='vl')) 



SELECT EntryData 

FROM 1dap_entry, fl,f2 

WHERE NOT (fl.fl='vl') 

AND (ldap_entry.EID=fl.UID) 
AND (ldap_entry.EID IN 
SELECT DEI D from ldapdesc 
WHERE PEID=<UID>) 



• If an entry does not contain the target attributes 
(for example, entry 5), this entry should not be se- 
lected by the SQL statement. 

• Since an attribute can have multiple values in LDAP, 
the table join query will select the entry in which 
one of the values meets the criteria (for example, 
entry 4). But this entry should not be selected based 
on the LDAP filter. 

One solution to these problems is to retrieve all the 
entries from the database and filter out the candi- 
date entry as done by the Netscape server. However, 
for an LDAP directory with a large number of entries, 
it takes a very long time to obtain the results. 

Another problem that we found with the intuitive 
solution mentioned above is that the OR operation 
does not perform well even for a small database with 
thousands of entries. Because it is using join to com- 
bine the attribute tables and ldap_entry tables, DB2 
will take a cross-product of all the rows in the at- 
tribute tables and the ldap_entry table for the OR 
operation. Even though most of the rows in the cross- 
product are irrelevant, the DB2 SQL engine dutifully 
reports all these rows, which are generally much more 
than are needed for the subquery evaluation. 

Our solution is based on the concept of EID sets. First, 
generate an SQL subquery for each LDAP operator 
based on the basic translation rules. The SQL sub- 
query will generate a set of entry ElDs that match 
the LDAP basic operation. If the LDAP logical oper- 
ator is OR, use UNION to unite the sets generated 
from the subquery. If the LDAP operator is AND, use 



Figure 15 Problem with SQL query 



i:nti!y ; i : : : : m\W9 ■ 




intercept to intercept the sets generated from the 
subquery. We experimented with two different ways 
to put together the SQL query based on the eid set 
concept. 

Figure 16 illustrates SQL queries that our system can 
generate for the LDAP filter (|(fl= 'vl ') (f2= V2')). 

Both SQL statements in Figure 16 generate the cor- 
rect results. The first query is basically to perform 
the JOIN operation with the LDAP descendant table 
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Figure 1 6 SQL queries generated by system 



Alternative I: 

SELECT entry. EntryData 
FROM ldap_entry as entry 
WHERE entry. EID in 

{ 

SELECT distinct 1 dap_entry . EID 
FROM ldap_entry, ldap_desc, fl 
WHERE 

(1dap_entry .EID=ldap_desc.DEID AND 
ldap_desc_AID=<id>) AND 
ldap_entry.eid=fl.eid AND 

UNION 

SELECT distinct I dap_entry . EID 
FROM ldap_entry, ldap_desc, f2 
WHERE (ldap_entry.EID=ldap+desc.DEID AND 

ldap_desc.AEID=<id>) 

AND ldap_entry.EID=f2.eid 

AND f2='v2' )) 

Alternative 2: 

SELECT entry. EntryData 

FROM 1dap_entry as entry WHERE entry. EID in 

{ SELECT distinct 1 dap_entry . EID FROM ldap_entry,ldap_desc 

WHERE 

(ldap_entry.EID=ldap_desc.DEID AND 1dap_desc.AEID=<id>) 
AND ldapentry.EID 

IN ((SELECT EID FROM fl WHERE fl = 'vD 
UNION (SELECT EID FROM SN WHERE SN ='v2" ))) 



within each subquery. The second query is to per- 
form the JOIN operation with the LDAP descendant 
table outside the subquery. Through extensive mea- 
surement, we chose to use Alternative 2 on the basis 
of the performance results. In addition to correct re- 
sults, the OR operation performs reasonably well with 
both Alternatives 1 and 2 since relevant entries will 
be filtered out in the subquery, and target entries 
will be reported back to the main query. 

With the set-based approach, the NOT operation can 
be performed by excluding entries with negation of 
the IN operation before the subquery. Figure 17 il- 
lustrates the operation. 



With the basic translation rules and the EID sets ap- 
proach, we implemented a recursive algorithm that 
can deal with complicated queries that have infinite 
logical operators. Figure 18 is an example of an SQL 
statement generated for a complex query with the 
AND, OR, and NOT operators. 

Directory access control 

Once information is stored in the directory, a method 
of protection must also be provided. When a user 
performs an operation on a directory entry, some 
mechanism must grant or deny permission to com- 
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Figure 1 7 NOT operation 



Filter String: 
(!(fl='vD) 

SQL Statement: 

SELECT entry . EntryData , 

FROM ldap_entry as entry 

WHERE entry. EID in 

( SELECT distinct LDAP_ENTRY . EID 

FROM ldap_entry ,ldap_desc 

WHERE 

(ldap_entry.EID=ldap_desc.DEID AND 1 dap_desc . AEID=<id>) 
AND 

ldap_entry.EID NOT IN 

((SELECT EID FROM fl where fl='vl'))) 



Figure 1 8 SQL statement generated for complex query 



Filter String: 

(&( I (objectclass=PERSON)(objectclass=GROUP))(sn=SMITH)(! (members 

SQL Statement: 

SELECT entry. EntryData , 

FROM ldap_entry as entry WHERE entry. EID in 

( SELECT distinct ldap_entry.EID FROM ldap_entry,ldap_desc 

WHERE 

(ldap_entry.EID=ldap_desc.DEID AND ldap_desc.AEID=?) 
AND 

ldap_entry.EID IN 

(((SELECT EID FROM OBJECTCLASS WHERE OBJECTCLASS = PERSON) 
UNION 

(SELECT EID FROM OBJECTCLASS WHERE OBJECTCLASS = GROUP) ) 
INTERSECT 

(SELECT EID FROM SN WHERE SN = SMITH) 
INTERSECT 

(SELECT EID FROM ldap_entry WHERE EID NOT IN 
(SELECT EID FROM MEMBER)))) 
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plete the operation. This security concern is ad- 
dressed by the access control model. 

In designing the access control model, the primary 
goals were to create a simple, secure, and highly ef- 
ficient means of protecting information stored within 
the LDAP directory. Using access control lists (ACLs), 
administrators can be assured that access to direc- 
tory information is secure. 

Administrators and users can view and update di- 
rectory information through the LDAP protocol. The 
same is true for access control information (aci). In 
order to use the standard ldap operations to man- 
age security, access control information is stored in 
attributes. By allowing updates to access control lists 
through the LDAP protocol, administrators are able 
to use a single protocol for all directory-related op- 
erations. The rest of this section provides the details 
of the access control mechanism that protects direc- 
tory information. 

Access control attributes. Each entry in the direc- 
tory has an associated set of access control informa- 
tion. This information is stored in seven different 
attributes: aclEntry, entryOwner, aclPropagate, 
ownerPropagate, ownerSource, aclSource, and in- 
heritOnCreate. 

The attributes aclEntry and entryOwner describe the 
permissions given to particular subjects. The en- 
tryOwner attribute is multivalued, consisting of dis- 
tinguished names (DNs) that are given full authority 
on an entry. These DNs are considered administra- 
tors for their corresponding entry. The aclEntry at- 
tribute is a multivalued attribute describing the user 
and group DNs that have been given privileges to per- 
form particular operations on an entry. The infor- 
mation is associated with an entry in the LDAP di- 
rectory. When an operation is initiated against that 
directory entry, the aclEntry and entryOwner are 
checked to determine whether the subject has the 
required permission to perform the requested op- 
eration on that particular directory entry. 

Placing an ACL on each individual directory entry 
would be a very time-consuming exercise and would 
be difficult, if not impossible, to administer. There- 
fore, it is necessary to use a sparse ACL model. This 
means that not every node in the directory must have 
an ACL. Instead, ACLs are strategically placed in the 
tree and apply to entire portions of the tree instead 
of just a single node. 



Although each directory entry has a set of ACL at- 
tributes that describe the security characteristics as- 
sociated with it, it is not necessary that these at- 
tributes must be explicitly placed on the directory 
entry. ACL entries and entry owners set on a specific 
node can propagate to descendant entries. The ac- 
cess control policy can be managed on a per-direc- 
tory entry basis, but it is also possible to manage en- 
tries in large portions of the tree with only a single 
ACL. 

ACLs and an entry owner can be set to apply to just 
a particular entry, or an entry and its entire subtree. 
Although both entry owners and ACL entries can 
propagate, their propagation is not linked. 

Entries on which an ACL has been placed are con- 
sidered to have an "explicit" ACL. Similarly, if an en- 
try owner has been set on a particular entry, that en- 
try has an "explicit" owner. Since the two are not 
intertwined, an entry with an explicit owner may or 
may not have an explicit ACL, and an entry with an 
explicit acl may or may not have an explicit owner. 
If these values are not explicitly present on an entry, 
the values are inherited from an ancestor node in 
the tree. 

Each explicit ACL and entry owner applies to the en- 
try on which it is set. Additionally, the values may 
apply to all descendants that do not have an explic- 
itly set value. These values are considered "propa- 
gated"; their values propagate through the directory 
tree. Propagation of a particular value continues un- 
til another propagating value is reached. If an ACL 
(or entry owner) does not propagate, it is consid- 
ered an "override" ACL (or entry owner) which ap- 
plies only to the directory entry on which it is set. 

The ACL of a directory entry can therefore be con- 
ceptually determined by the following algorithm: "Is 
there an explicit ACL set at the directory entry?" If 
yes, then that is the ACL of the directory entry. If no, 
then traverse the tree backwards until an ancestor 
node is reached with a propagate ACL. If no node 
is found with a propagate ACL, then only the entry 
owner and the administrator will be granted access 
to the directory entry. 

The inheritOnCreate attribute ensures that direc- 
tory entries are secure when they are created. At di- 
rectory entry creation time, the creator may wish to 
specify attribute values for the access-control-related 
attributes. Whether or not the owner may specify ACL 
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and ownership properties is determined by the par- 
ent directory entry's owner "inherit on create" flag. 

Subjects. Subjects are comprised of an (ldap) dis- 
tinguished name and a privilege attribute type and 
represent a directory entry. The privilege attribute 
type describes whether the DN is an "access-id," a 
"group," or a "role." These different types enable 
access to be granted on both a per-user granularity 
as well as by group or role membership. 

When the user authenticates to the directory at bind 
time, the subject information is retrieved. The DN 
used during authentication becomes the "bind DN." 
Additionally, group and role membership is deter- 
mined and added to the list of credentials for the 
authenticated user. These credentials are later used 
by the access determination functions. 

Two additional DNs have been created and are used 
as pseudosubjects. These pseudosubjects are not rep- 
resented by an entry in the directory but can be used 
in an ACL. They are used to refer to large numbers 
of DNs that share a common characteristic at bind 
time in relation to either the operation being per- 
formed or the directory entry on which the opera- 
tion is being performed. 

The first pseudo-DN is the group "cn=Anybody." 
When specified as part of an ACL, this group refers 
to all users. Users cannot be removed from this 
group, and this group cannot be removed from the 
database. "cn= Anybody" is considered to be the 
group of all unauthenticated users or any user that 
does not have a specific ACL on a directory entry. 

The second pseudo-DN is the access-id "cn=this." 
When specified as part of an ACL, it grants permis- 
sions when the bind DN matches the directory entry 
DN on which the operation is performed. If an 
operation is performed on the directory entry 
"cn=personA, ou=lBM, c=us," permissions asso- 
ciated with the access-id "cn=self ' would be granted 
when the bind DN is "cn=personA, ou=lBM, 
c=us." 

Permissions. The permission set is derived from the 
six basic LDAP operations that maybe performed on 
a directory entry. The permissions are: add a direc- 
tory entry, delete a directory entry, read an attribute 
value, write an attribute value, search for an attribute, 
and compare an attribute value. Each permission is 
discrete; one permission does not imply another. 



Table 1 Permissions for LDAP operations 



Operation Permission Needed 




Idapaikl 


add (on parent entry) 


Idapdclete 


delete (on entry) 


ldapmodify 


write (on attribute of entry) 


Idapseurch: return 


seareh (on attribute) 


attribute names 




Idapseareh: return 


search, read (on attribute) 


attributes and value 




Idapmodrdn 


write (on attribute) 


ldapcompare 


compare (on attribute) 



Table 1 lists the permissions needed to perform each 
of the LDAP operations. 

Add and delete permissions apply to an entire di- 
rectory entry. Read, write, search, and compare ap- 
ply to the attributes within the directory entry. 

Attribute access classes. It is likely that many at- 
tributes will require the same type of protection. It 
is therefore useful to coarsen the access policy gran- 
ularity by grouping attributes with similar access sen- 
sitivities. This action reduces the number of ACLs 
within the directory and greatly simplifies adminis- 
tration. 

Attributes are grouped together in "attribute access 
classes." Within the schema file, attributes are 
mapped to an access control class. Each class is dis- 
crete; access to one attribute class does not imply 
access to another class. 

Instead of specifying that a subject has access to an 
attribute, the administrator gives a subject permis- 
sion to an access class. This action grants the subject 
the specified permission to all attributes within that 
access class. 

There are three access classes to which users can as- 
sign user modifiable attributes: normal, sensitive, and 
critical. Two additional attribute classes handle the 
special-case attributes. The restricted attribute class 
contains all of the ACL attributes, and the system at- 
tribute class contains all of the non-user-modifiable 
operational attributes. 

Performance measurement 

DB2 is a very powerful relational database product 
with a wide assortment of tools and tuning param- 
eters that allow us to tailor our use of DB2 for op- 
timum performance. 



IBM SYSTEMS JOURNAL, VOL 39, NO 2, 2000 



SHI ET AL. 377 



Figure 1 9 Base directory structure 
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Improvements. The following areas were key im- 
provements in our LDAP/DB2 implementation with 
respect to performance: 

• SQL query tuning. We found that although two sim- 
ilar queries could produce the same results in an 
entry-level directory (i.e., f OK entries), the tuned 
query performed much better when the directory 
was scaled to larger levels (i.e., 100K — » 1M). The 
DB2 explain tool was used to tune our queries. 

• Database indices. We also found that a single in- 
dex on some of our tables did not scale well. We 
used a combined index for each LDAP attribute that 
was searchable, as well as a secondary single index 
for queries involving only the EID. The DB2 re- 
orgchk was used to analyze the effectiveness of our 
indices and indicate when tables needed to be re- 
organized for better performance. 

• Database partitioning. We found that splitting the 
database across multiple containers located on sep- 
arate drives extended our scalability (avoiding op- 
erating system file size limitations) in addition 
to improving our search performance. The DB2 
backup and redirected restore commands were 
used to experiment with different combinations of 
database partitioning and their effects on perfor- 
mance. 

• Referral caching. We found that caching referral 
information at process startup time improved 
search performance because the information was 
already available when needed. 



• Database and database manager tuning. We found 
the DB2 database monitor extremely useful for tun- 
ing our DB2 parameters for our specific directory 
contents. 

• Server caching. We added a filter cache and an en- 
try cache so that we could quickly determine 
whether a search had been done before, and if so, 
quickly return the results from memory instead of 
going to the database . We also added an ACL cache. 

• Substring search improvements. We added a re- 
verse index column in our attribute table to use 
when processing substring searches with a wild- 
card character "*" at the beginning. 

Benchmark description. Because of a lack of a stan- 
dard benchmark at the time, we created our own for 
internal development use and user-level competitive 
assessment. Our ldap benchmark performs assorted 
ldapsearch queries for various entries contained in 
a directory. For this experiment, we used actual data 
from IBM's Callup (internal telephone) directory. A 
base of 100K (101498) unique entries was created 
by combining 12 IBM U.S. directories. Figure 19 il- 
lustrates our base directory structure. 

Then, to scale up to 1 million entries, we made 10 
copies of the data with an additional "ou" level added 
for uniqueness (Table 2). For example, if the first 
100K contained Dave Bachmann in Austin, the sec- 
ond 100K would contain Dave Bachmann in Aus- 
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Figure 20 First time response time 




Table 2 Data copies for 1 million entries 



Number of Entries Respective DNs 



100K dn: <.n=Dave Bachmann, ou = Austin, o=IBMJJS, c=US 

2(I0K tin: cn = l)ave ISachinann, on = .\usiin I , ou = Auslin, o = IBM US, c=US 

JUIftK dn: cn = Dave liachniann, ou=Austin2, ou=Austin, o=lBM_US, c=US 



900K dn: cn=Dave Bachmann, ou=Austin8, ou=Austin, o=IBM US, c=US 

1M dn:cn Dave Badimann, ou = Austin9, ou = Austin, o = IBM JUS, c US 



tinl, and the last 100K would contain Dave Bach- 
mann in Austin9. 

Each entry in the directory contains three required 
attributes and up to 43 additional attributes. On av- 
erage, there are approximately 20 attributes per en- 
try. We specified four indexed attributes in our LDAP 
configuration file (sn, cn, ou, telephone) and per- 
formed assorted ldapsearch operations with differ- 
ent filters and search scopes (base, one level, and 
subtree) on these attributes. 

The hardware and software used for the experiment 
was as follows: 



• Client— RS/6000* Model 60f5 (60f ,66 MHz), 98 
MB of memory, 1.5 GB DASD, 16 Mbps Token 
Ring, SPECint_base95 = 1.69, AIXv4.2.1, IBM 
eNetwork Directory Service v2.1 client 

• Server— RS/6000 Model E20, 256 MB of memory, 
1X2.2 GB and 4x4 GB disks, 16 Mbps Token 
Ring, SPECint_base95 = 3.43, AIXv4.3, IBM eNet- 
work Directory Service v2.1, DB2 v5 

Benchmark results. Figures 20-23 illustrate our 
benchmark results. We have separated them into 
two categories: the first ldapsearch response times 
and the cached ldapsearch response times. First 
ldapsearch response times are important for appli- 
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Figure 21 Cache response time 




' ■■ & J(: IV/O! k l>Uil ... ... : 



cations and users performing random searches in the 
directory. A good example where first time searches 
are important would be a white pages directory. 
Cached ldapsearch response times are important for 
applications and users performing repeated searches 
in the directory. A good example where cached 
searches are important would be a public key infra- 
structure that stores certificates in the LDAP direc- 
tory. For our benchmark results, we compared the 
AIX IBM eNetwork Directory v2.1 product to a com- 
petitive LDAP server (nonrelational) running on the 
same hardware. 

For first time equality searches, the IBM average re- 
sponse time was four times better than the compet- 
itor we studied, as illustrated in Figure 20. Further 
analysis showing the distribution of response times 
with respect to the number of entries returned from 
the ldapsearch is shown in Figure 22. The compet- 
itor did not do well for any equality searches that 
returned more than one or two entries. For equality 
searches that returned more entries, the IBM re- 
sponse time increased only slightly and performed 
consistently well. We believe that this result shows 
the advantage of using a relational database for a 
directory data store. Although there is a certain 



amount of overhead associated with using a full-func- 
tion database, its advantage becomes evident when 
returning large amounts of data. 

For first time substring searches, response time var- 
ied depending on the location of the wild-card char- 
acter "'*." In a previous release, we found that we 
did not do well at all for this character at the be- 
ginning; we have since implemented a solution that 
performs much better. For searches with the char- 
acter at the end or middle, we performed very well, 
beating our competitor significantly for every search 
in this category. 

For Boolean first time queries (queries with AND and 
or), the IBM average response time was seven times 
faster than that of the competitor. Further analysis 
of the searches showed that IBM did much better for 
searches that used the same attributes. For exam- 
ple, "(|(sn=Corn)(sn=Jones))" performed better 
than one that used two different attributes such as 
"(&(sn= Corn) (division = 1 1))." This results from our 
attributes being stored in separate database tables. 
Similar to the equality results, IBM did much better 
when multiple results were returned. We did not in- 
clude queries with the NOT operator in Figure 20 be- 
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Figure 22 LDAP first response time distribution 
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Figure 23 LDAP cache response time distribution 
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cause the competitive product takes a very long time 
to return any data. Our server returns correct data 
in seconds. 

For cache searches, both products performed very 
quickly, regardless of the type of search being per- 
formed. This result is expected, since the data are 
being retrieved from memory instead of from the da- 
tabase. Further analysis showing the distribution of 
response times with respect to the number of cached 
entries returned from the ldapsearch is shown in Fig- 
ure 23. For both products, the response time in- 
creased as the number of entries returned increased. 
Based on the data we have, the slope of the increase 
was twice as much for our competitor. For applica- 
tions and users returning multiple entries, the IBM 
eNetwork Directory would be faster. We also exper- 
imented with committing and not committing SQL 
operations for search. We did not observe any per- 
formance differences. 

Performance outlook. There are many more perfor- 
mance enhancements to our LDAP implementation 
that can further improve performance. Some exam- 
ples are further caching improvements, table reduc- 
tion, modification improvements, and lock reduction. 



Conclusions 

The design and implementation of an LDAP direc- 
tory service using the IBM DB2 relational database as 
a data store and query engine has been examined. 
It has been shown that although the mapping of the 
LDAP hierarchical model to a set of relational tables 
(database schema) is anontrivial task, a well-thought- 
out (yet simple) set of relational tables and use of 
SQL produces a secure directory service with com- 
petitive performance. Use of a relational database 
also provides: 

• An easy growth path for a highly scalable direc- 
tory server whose (1) database size on a single uni- 
processor machine is not limited by the maximum 
file size, (2) client/server configuration allows for 
a network dispatcher to route requests among the 
LDAP servers for scaling to very large numbers of 
users while maintaining performance, and (3) par- 
allel edition can automatically partition the data- 
base across multiple machines to support even 
more users and larger directories 

• A robust data store whose transaction capability 
gives the LDAP server protection from hardware 
and software failures while maintaining the integ- 



rity of the directory information (no corrupted di- 
rectory) 

• Alternative administrative capabilities in areas that 
are not yet standardized by the LDAP such as ac- 
cess control, on-line backup and restore, replica- 
tion, and a fast bulk load facility 

• A powerful query processing engine to securely 
handle simple to complex queries using the well- 
known and powerful SQL language 

Acknowledgments 

Rod Mancisidor has inspired many of the ideas that 
we have presented in this paper. Larry Fichtner, 
Chin-Long Shu, Mark McConaughy, and Trung Tran 
also contributed greatly to the design and implemen- 
tation. We would like to thank Dimitri Milionis and 
Bill Wilkins for providing tremendous help and ad- 
vice with respect to DB2 performance and technical 
issues. We would also like to thank the management 
team (Reggie Hill, Sharal Brown, Dave Dyar, and 
Terry Dunkle) who provided us with strong support 
to deliver this product within a tight schedule. 

Trademark or registered trademark of International Business 
Machines Corporation. 

"Trademark or registered trademark of Novell, Inc., Netscape 
Communications Corporation, Lotus Development Corporation, 
or QUALCOMM, Inc. 

Cited references 

1. T. Howes and M. Smith, LDAP: Programming Directory-En - 
• -t, nswith Lightweight Dii . i 

ISBN 1-57870-000-0, Macmillan Technical Publishing, New 
York (1997). 

2. S. Shi, E. Stokes, C. Corn, D. Bachmann, L. Jones, and 
S. Pasha, "Exploiting Relational Database Technologies with 
LDAP Directory Service," Proceedings of the International 
Conference on Advanced Science and Technology ( 1 998 ), pp. 
222-227. 

3. The SLAPD and SLURPD Administrator's Guide, Release 33, 
I fn i\ ci sit\ of Michigan, Ann Arbor, MI (.April 30, 1996). 

4. W. Yeong, i, l!o in 1 Kill / ■ n R • ' vA 
cess Protocol, RFC 1777, Internet Engineering Task Force 
(March 1995); available at http://www.ietf.org/rfc/. 

5. T. Howes and M. Smith, The LDAP Application Program In- 
icrfin e, lU-'i ' 1823, 1 nlcrncl f'ngineering Task force (August 
1995); available at http://www.ietf.org/rfc/. 

6. S. Killed String Representation of Distinguished Names, Rl C 
f 779, Internet Engineering Task Force (March 1995); avail- 
able at http://www.ietf.org/rfc/. 

7. T. Howes, S. Kille, W. Yeong, and C. Robbins, The String 
Rcptet i tu 1 I S> RFC 1778, In- 
ternet Engineering Task Force (March 1995); available at 
http://www.ietf.org/rfc/. 

8. T. A. Howes and M. C. Smith, A Scalable, Deployable Direc- 
tory Service Framework for the Internet, CITI Technical Re- 
port 95-7, University of Michigan, Ann Arbor, MI (April 
1995). 



382 SHI ET AL. 



IBM SYSTEMS JOURNAL, VOL 39, NO 2, 2000 



9. DATABASE 2, Application Programming Guide for Common 
Servers, S20H-4643-01, IBM Corporation. 

10. J. J. Ordille and B. P. Miller, "Nomenclature Descriptive 
Query Oplinii/alion lor I arge X.5I K) 1 ■ n\ iron men Is," Pn,, 'ceil- 
ings of the 1991 SIGCOMM Conference (1991), pp. 301-314. 

11. D. Barrowman and P. Martin, "The Performance of SQL 
Queries in X.500 Directory System," Computer Communi- 
cations 21, 133-146 (1998). 

12. ( Oracle 7 Server SQL Reference Manual, Oracle Corporation, 
see http://technet.oracle.com/docs/products/oracle7/ 
doc_index.htm. 

13. Maintaining Database* by Means of Hierarchical Genealogi- 
cal Table, United States Patent. No. -4074"! (November 14, 
1995). 

14. S. S. B. Shi, L. G. Fichtner, R. A. Mancisidor, and C. Corn, 
An Eft era. Impiei i ' i > Directory Access 
Protocol (LDAP) Search Queries with SQL, filed as Docket 
AT998183 (1998). 

15. S. S. B. Shi, L. G. Fichtner, R. A. Mancisidor, and C. Corn, 
1/ iii 1 .DA 1 ' ii 
bles, filed as Docket AT998106 (1998). 

16. J. Teuhola, "An Efficient Relational Implementation of Re- 
cursive Relationships Using Path Signatures," The 10th In- 
ternational Conference on Data Engineering, Houston, Texas 
(February 1994), pp. 446 -454. 

17. R. Agrawal and H. V. Jagadish, "Direct Algorithms for Com- 
puting the Transitive Closure of Database Relations," Pro- 
ceedings of the 13th VLDB Conference, Brighton, England 
(1987), pp. 255-266. 

18. "An Amateur's Introduction to Recursive Query Processing 
Strategies," I'm, veilings of. U M SH MOD Conference, Wash- 
ington, DC (1986), pp. 16-52. 

19. P. Ciaccia, D. Maio, and P. Tiberjo, "A Method for Hier- 
archy Processing in Relational Databases," Information Sys- 
tems 14, No. 3, 93-105 (1989). 

20. E. H. Herrin and R. A. Finkel, "Schema and Tuple Trees: 
An Intuitive Structure for Representing Relational Data," 
Computing Systems: The lournal of the USENIX Association 
9, No. 2, 93-118 (1996). 

21. T. A. Howes, The Lightweight Directory Access Protocol: X.500 
Lite, CITI Technical Report 95-8, University of Michigan, 
Ann Arbor, MI (July 27, 1995). 

22. I Howes, A Sn liepivst i, LDAP 1 ers, RFC 1960, 
Internet Engineering Task Force (June 1996); available at 
http://www.ietf.org/rfc/. 

Accepted for publication September 15, 1999. 



Debora Byrne IBM Network Computing Software Division, 11400 
Unmet Road, Austin, Texas 78758 lelectronic mail: djbyrne@us. 
ibm.com). Ms. Byrne is a staff software engineer and is team lead 
for the SecureWay Directory User Interface. She has a B.S. in 
computer science from Duke University and an M.S. in software 
engineering from the University of Houston — Clear Lake. 

Cindy Fleming Corn IBM Network Computing Software Di\ -is! on, 
11400 Burnet Road, Austin, Texas 78758 (electronic mail: ccorn® 
ns.ibm.eom/. Ms. Coi n is a staff software engineer and the IBM 
Directory Performance lead member. She has a B.S. in computer 
science and mathematics from the University of Illinois. 

David Bachmann IBM Network Computing Software Division, 
11400 Burnet Road, Austin, Texas 78758 (electronic mail: 

1 1 eh in i It i 
gineer and the team lead for Distributed Systems Performance. 
He has a B.S. in computer science and mathematics from Iowa 
State University, and an M.S. and Ph.D. in computer science and 
engineering from the University of Michigan. 

Tom Jones Innosoft International, Inc., 8911 Capital of Texas 
Highway. Suite 4140, Austin, Texas 78759 (electronic mail: 
tom.jones(trinnosoft A oin •. Mr. Jones was a software engineer and 
member of the SecureWay Directory Performance team at the 
IBM Network < ompuling Software Di\ ision in Austin bed >re join- 
ing Innosoft. He has eighteen years of computer science expe- 
rience, the last five of which have been in performance analysis 
for distributed systems. Mr Jones is currently a senior quality as- 
surance engineer at Innosoft International, lie is responsible for 
product testing of the Innosoft Distributed Directory Server, the 
Innosoft DirectoryPortal, and the Innosoft LDAP Proxy Server, 
with much of his work including LDAP performance testing. 



Shepherd S. B. Shi IBM Network Computing Software Division, 
11400 Burnet Road, Austin, Texas 78758 (electronic mail: 
i Di Shi i v hi i il i i engineei tid ehi I 

developer for IBM eNetwork Directory Services. He has a B.S. 
in computer science from National Taiwan University, an M.S. 
in computer science from Stanford University, and a Ph.D. in com- 
puter science from the University of Illinois. 



Ellen Stokes IBM Network Computing Software Division, 11400 
Burnet Road, Austin, Texas 78758 (electronic mail: ejstokes@us. 
ibm.com j. Ms. Stokes is a senior technical staff member and the 
lead architect for Directory Services across IBM. She has aB.S.E. 
in computer engineering and an M.S.E. in computer, informa- 
tion, and control engineering from the University of Michigan. 



IBM SYSTEMS JOURNAL, VOL 39, NO 2, 2000 



SHI ET AL. 



